
80 research outputs found

On the approximate maximum likelihood estimation for diffusion processes

Authors: Chang Jinyuan, Chen Song Xi
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2011

The transition density of a diffusion process does not admit an explicit expression in general, which prevents full maximum likelihood estimation (MLE) based on discretely observed sample paths. Aït-Sahalia [J. Finance 54 (1999) 1361--1395; Econometrica 70 (2002) 223--262] proposed asymptotic expansions to the transition densities of diffusion processes, which lead to an approximate maximum likelihood estimation (AMLE) for parameters. Building on Aït-Sahalia's [Econometrica 70 (2002) 223--262; Ann. Statist. 36 (2008) 906--937] proposal and analysis of the AMLE, we establish the consistency and convergence rate of the AMLE, which reveal the roles played by the number of terms used in the asymptotic density expansions and the sampling interval between successive observations. We find conditions under which the AMLE has the same asymptotic distribution as that of the full MLE. A first order approximation to the Fisher information matrix is proposed.

Comment: Published at http://dx.doi.org/10.1214/11-AOS922 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Munich RePEc Personal Archive

CiteSeerX

Crossref

Principal component analysis for second-order stationary vector time series

Authors: Chang Jinyuan, Guo Bin, Yao Qiwei
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 12/04/2017

We extend principal component analysis (PCA) to second-order stationary vector time series in the sense that we seek a contemporaneous linear transformation for a p-variate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially. Therefore those lower-dimensional series can be analysed separately as far as the linear dynamic structure is concerned. Technically it boils down to an eigenanalysis for a positive definite matrix. When p is large, an additional step is required to perform a permutation in terms of either maximum cross-correlations or FDR based on multiple tests. The asymptotic theory is established for both fixed p and diverging p when the sample size n tends to infinity. Numerical experiments with both simulated and real data sets indicate that the proposed method is an effective initial step in analysing multiple time series data, which leads to substantial dimension reduction in modelling and forecasting high-dimensional linear dynamical structures. Unlike PCA for independent data, there is no guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation which has advantages in, for example, forecasting future values. The method can also be adapted to segment multiple volatility processes.

Comment: The original title dated back to October 2014 is "Segmenting Multiple Time Series by Contemporaneous Linear Transformation: PCA for Time Series"

arXiv.org e-Print Archive

LSE Research Online

High dimensional stochastic regression with latent factors, endogeneity and nonlinearity

Authors: Chang Jinyuan, Guo Bin, Yao Qiwei
Publication venue: 'Elsevier BV'
Publication date: 29/03/2015

We consider a multivariate time series model which represents a high dimensional vector process as a sum of three terms: a linear regression on some observed regressors, a linear combination of some latent and serially correlated factors, and a vector white noise. We investigate the inference without imposing stationarity conditions on the target multivariate time series, the regressors and the underlying factors. Furthermore, we deal with endogeneity, in the sense that the observed regressors and the unobserved factors may be correlated. We also consider the model with a nonlinear regression term which can be approximated by a linear regression function with a large number of regressors. The convergence rates for the estimators of the regression coefficients, the number of factors, the factor loading space and the factors are established under settings in which the dimension of the time series and the number of regressors may both tend to infinity together with the sample size. The proposed method is illustrated with both simulated and real data examples.

arXiv.org e-Print Archive

Crossref

LSE Research Online

High dimensional generalized empirical likelihood for moment restrictions with dependent data

Authors: Chang Jinyuan, Chen Song Xi, Chen Xiaohong
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

This paper considers maximum generalized empirical likelihood (GEL) estimation and inference on parameters identified by high dimensional moment restrictions with weakly dependent data when the dimensions of the moment restrictions and the parameters diverge along with the sample size. The consistency with rates and the asymptotic normality of the GEL estimator are obtained by properly restricting the growth rates of the dimensions of the parameters and the moment restrictions, as well as the degree of data dependence. It is shown that even in the high dimensional time series setting, the GEL ratio can still behave like a chi-square random variable asymptotically. A consistent test for over-identification is proposed. A penalized GEL method is also provided for estimation under a sparsity setting.

arXiv.org e-Print Archive

Munich RePEc Personal Archive

University of Melbourne Institutional Repository

Marginal empirical likelihood and sure independence feature screening

Authors: Chang Jinyuan, Tang Cheng Yong, Wu Yichao
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 06/11/2013

We study a marginal empirical likelihood approach in scenarios where the number of variables grows exponentially with the sample size. The marginal empirical likelihood ratios as functions of the parameters of interest are systematically examined, and we find that the marginal empirical likelihood ratio evaluated at zero can be used to determine whether or not an explanatory variable contributes to a response variable. Based on this finding, we propose a unified feature screening procedure for linear models and generalized linear models. Unlike most existing feature screening approaches, which rely on the magnitudes of some marginal estimators to identify true signals, the proposed screening approach is capable of further incorporating the level of uncertainty of such estimators. This merit is inherited from the self-studentization property of the empirical likelihood approach, and extends the insights of existing feature screening methods. Moreover, we show that our screening approach is less restrictive in its distributional assumptions, and can be conveniently adapted to a broad range of scenarios such as models specified using general moment conditions. Our theoretical results and extensive numerical examples by simulations and data analysis demonstrate the merits of the marginal empirical likelihood approach.

Comment: Published at http://dx.doi.org/10.1214/13-AOS1139 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Estimation of subgraph density in noisy networks

Authors: Chang Jinyuan, Kolaczyk Eric D., Yao Qiwei
Publication date: 30/06/2020

While it is common practice in applied network analysis to report various standard network summary statistics, these numbers are rarely accompanied by uncertainty quantification. Yet any error inherent in the measurements underlying the construction of the network, or in the network construction procedure itself, must necessarily propagate to any summary statistics reported. Here we study the problem of estimating the density of an arbitrary subgraph, given a noisy version of some underlying network as data. Under a simple model of network error, we show that consistent estimation of such densities is impossible when the rates of error are unknown and only a single network is observed. Accordingly, we develop method-of-moment estimators of network subgraph densities and error rates for the case where a minimal number of network replicates are available. These estimators are shown to be asymptotically normal as the number of vertices increases to infinity. We also provide confidence intervals for quantifying the uncertainty in these estimates based on the asymptotic normality. To construct the confidence intervals, a new and non-standard bootstrap method is proposed to compute the asymptotic variances, which would otherwise be infeasible. We illustrate the proposed methods in the context of gene coexpression networks.

arXiv.org e-Print Archive